Search for: All records

Creators/Authors contains: "Stanford, Caleb"

Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites, whose policies may differ from those of this site.

  1. The Rust programming language is a prominent candidate for a C and C++ replacement in the memory-safe era. However, Rust's safety guarantees do not in general extend to arbitrary third-party code. The main purpose of this short paper is to point out that this is true even entirely within safe Rust, as we illustrate through a series of counterexamples. To complement our examples, we present initial experimental results to investigate: do existing program analysis and program verification tools detect or mitigate these risks? Are these attack patterns realizable via input to publicly exposed functions in real-world Rust libraries? And to what extent do existing supply chain attacks in Rust leverage similar attacks? All of our examples and associated data are available as an open source repository on GitHub. We hope this paper will inspire future work on rethinking safety in Rust, especially work that goes beyond the safe/unsafe distinction and hardens Rust against a stronger threat model of attacks that can be used in the wild.
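    To make the risk concrete, here is a minimal sketch (our own illustration, not taken from the paper's repository) of the kind of counterexample the abstract describes: a third-party trait implementation, written entirely in safe Rust, that performs a hidden side effect when the host crate does something as innocuous as formatting a value. The environment-variable name and file path are hypothetical.

        // Entirely safe Rust: no `unsafe` block anywhere.
        use std::fmt;

        pub struct Version(pub &'static str);

        impl fmt::Display for Version {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                // Safe Rust may freely touch the file system and environment:
                // formatting this value silently exfiltrates a secret.
                let _ = std::fs::write(
                    "/tmp/exfil",
                    std::env::var("API_KEY").unwrap_or_default(),
                );
                write!(f, "{}", self.0)
            }
        }

        fn main() {
            // The host crate believes it is only printing a version string.
            println!("using version {}", Version("1.2.3"));
        }

    Nothing here requires unsafe code: Rust's type system treats file and environment access as ordinary safe operations, which is exactly the gap the paper probes.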
  2. Code completions produced by today's large language models (LLMs) offer no formal guarantees. We propose proof-carrying code completions (PC3). In this paradigm, a high-resourced entity (the LLM provided by the server) must provide a code completion together with a proof of a chosen safety property which can be independently checked by a low-resourced entity (the user). In order to provide safety proofs without requiring the user to write specifications in formal logic, we statically generate preconditions for all dangerous function calls (i.e., functions that may violate the safety property) which must be proved by the LLM. To demonstrate the main ideas, we provide a prototype implementation in the program verification language Dafny, and a case study focusing on file system vulnerabilities. Unlike Python code generated by GPT-4, Dafny code generated by PC3 provably avoids a common weakness related to path traversal (CWE-35), using a single generation attempt (k = 1) and a modest number of tokens (3,350). Our tool is available as an open source repository at https://github.com/DavisPL/PCCC.
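    The paper's prototype is written in Dafny; the following Rust sketch (function names hypothetical) only illustrates the shape of the statically generated precondition that PC3 would require a proof of before a dangerous file-system call: the resolved path must stay within an allowed root, ruling out path traversal (CWE-35).

        use std::fs;
        use std::io;
        use std::path::{Path, PathBuf};

        /// Precondition for the dangerous call: the resolved path must stay
        /// inside `root`, ruling out path traversal (CWE-35).
        fn resolve_within(root: &Path, user_path: &Path) -> io::Result<PathBuf> {
            let resolved = root.join(user_path).canonicalize()?;
            if resolved.starts_with(root.canonicalize()?) {
                Ok(resolved)
            } else {
                Err(io::Error::new(
                    io::ErrorKind::PermissionDenied,
                    "path escapes root",
                ))
            }
        }

        fn read_user_file(root: &Path, user_path: &Path) -> io::Result<String> {
            // The dangerous call is only reachable once the precondition holds.
            let safe = resolve_within(root, user_path)?;
            fs::read_to_string(safe)
        }

        fn main() {
            let root = Path::new("/srv/data");
            // "../etc/passwd" is rejected; "report.txt" under the root is allowed.
            match read_user_file(root, Path::new("../etc/passwd")) {
                Ok(text) => println!("{text}"),
                Err(e) => eprintln!("rejected: {e}"),
            }
        }

    In PC3 the analogous check is a logical precondition discharged by the verifier, so the guard is proved rather than tested at runtime.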
  3. Zhang, Danfeng; Krishnaswami, Neel (Ed.)
    Over the last several years, the Rust programming language has gathered a following among software developers for its robust memory safety features. Nevertheless, it remains susceptible to potentially harmful side effects in untrusted code and is therefore vulnerable to supply chain attacks. We investigate whether such attacks can be prevented by retroactively enforcing side-effect safety. In this extended abstract, we introduce Coenobita, a Rust library that prevents undesirable side effects using statically enforced, zero-cost, and unobtrusive capabilities, without additional performance overhead. To evaluate Coenobita's practicality and effectiveness, we conducted two case studies porting the popular Rust crates walkdir and remove_dir_all to Coenobita. Porting walkdir required modifying or adding around 242 lines across three files that originally contained about 1,800 lines in total. Benchmarks run on the 46 tests provided in walkdir and their equivalents in coenobita-walkdir show little change in runtime for most tests.
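    A minimal sketch of the capability style the abstract describes, with hypothetical names rather than Coenobita's actual API: a zero-sized token must be threaded to any function that touches the file system, so the side effect is visible in the types and costs nothing at runtime.

        use std::fs;
        use std::io;
        use std::path::Path;

        /// Zero-sized capability: holding a value of this type is the only
        /// way to call the file-system wrappers below; it has no runtime cost.
        pub struct FsCap(());

        impl FsCap {
            /// Only trusted code (e.g., `main`) should mint the capability;
            /// library code must receive it explicitly as an argument.
            pub fn acquire() -> FsCap {
                FsCap(())
            }
        }

        /// A third-party helper can only touch the file system if its caller
        /// passes the capability, making the side effect explicit in its type.
        pub fn read_config(_cap: &FsCap, path: &Path) -> io::Result<String> {
            fs::read_to_string(path)
        }

        fn main() {
            let cap = FsCap::acquire();
            match read_config(&cap, Path::new("Cargo.toml")) {
                Ok(s) => println!("{} bytes read", s.len()),
                Err(e) => eprintln!("{e}"),
            }
        }

    Because the token is a zero-sized type, the compiler erases it entirely, which is the sense in which such capabilities can be zero-cost.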
  4. Motivated by distributed data processing applications, we introduce a class of labeled directed acyclic graphs constructed using sequential and parallel composition operations, and study automata and logics over them. We show that deterministic and non-deterministic acceptors over such graphs have the same expressive power, which can be equivalently characterized by Monadic Second-Order logic and the graded µ-calculus. We establish closure under composition operations and decision procedures for membership, emptiness, and inclusion. A key feature of our graphs, called synchronized series-parallel graphs (SSPG), is that parallel composition introduces a synchronization edge from the newly introduced source vertex to the sink. The transfer of information enabled by such edges is crucial to the determinization construction, which would not be possible for the traditional definition of series-parallel graphs. SSPGs allow both ordered ranked parallelism and unordered unranked parallelism. The latter feature means that in the corresponding automata, the transition function needs to account for an arbitrary number of predecessors by counting each type of state only up to a specified constant, thus leading to a notion of counting complexity that is distinct from the classical notion of state complexity. The determinization construction translates a nondeterministic automaton with n states and k counting complexity to a deterministic automaton with 2^(n²) states and k^n counting complexity, and both these bounds are shown to be tight. Furthermore, for nondeterministic automata a bound of 2 on counting complexity suffices without loss of expressiveness.
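    As a rough illustration (not the paper's formal definition), the compositional structure of SSPGs can be captured by an algebraic datatype in which parallel composition takes an unordered, unranked collection of branches:

        /// Sketch of the compositional structure the abstract describes:
        /// graphs built from labeled atoms by sequential and parallel
        /// composition, where parallel composition also introduces a
        /// synchronization edge from the fresh source to the fresh sink.
        #[derive(Debug)]
        enum Sspg<L> {
            /// A single labeled edge.
            Atom(L),
            /// Sequential composition: the sink of the first graph is fused
            /// with the source of the second.
            Seq(Box<Sspg<L>>, Box<Sspg<L>>),
            /// Parallel composition of an unordered, unranked collection of
            /// branches between a fresh source and sink.
            Par(Vec<Sspg<L>>),
        }

        fn main() {
            use Sspg::*;
            // (a ; b) composed in parallel with c:
            let g: Sspg<char> =
                Par(vec![Seq(Box::new(Atom('a')), Box::new(Atom('b'))), Atom('c')]);
            println!("{g:?}");
        }

    The unranked Vec in Par is what forces the automaton's transition function to count predecessor states only up to a constant, giving rise to the counting-complexity measure.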
  5. Distributed architectures for efficient processing of streaming data are increasingly critical to modern information processing systems. The goal of this paper is to develop type-based programming abstractions that facilitate correct and efficient deployment of a logical specification of the desired computation on such architectures. In the proposed model, each communication link has an associated type specifying tagged data items along with a dependency relation over tags that captures the logical partial ordering constraints over data items. The semantics of a (distributed) stream processing system is then a function from input data traces to output data traces, where a data trace is an equivalence class of sequences of data items induced by the dependency relation. This data-trace transduction model generalizes both acyclic synchronous data-flow and relational query processors, and can specify computations over data streams with a rich variety of partial ordering and synchronization characteristics. We then describe a set of programming templates for data-trace transductions: abstractions corresponding to common stream processing tasks. Our system automatically maps these high-level programs to a given topology on the distributed implementation platform Apache Storm while preserving the semantics. Our experimental evaluation shows that (1) while automatic parallelization deployed by existing systems may not preserve semantics, particularly when the computation is sensitive to the ordering of data items, our programming abstractions allow a natural specification of the query that contains a mix of ordering constraints while guaranteeing correct deployment, and (2) the throughput of the automatically compiled distributed code is comparable to that of hand-crafted distributed implementations.
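    As a rough sketch (the tags and the dependency relation below are hypothetical), the typing idea can be illustrated as follows: items carry tags, a dependency relation over tags fixes which reorderings are observable, and two sequences denote the same data trace when their projections onto every dependent pair of tags agree.

        #[derive(Clone, Copy, Debug, PartialEq)]
        enum Tag {
            SensorA,
            SensorB,
            Marker, // a synchronization marker, dependent on everything
        }

        /// Dependency relation over tags: items with dependent tags must keep
        /// their relative order; independent items may be freely reordered.
        fn dependent(x: Tag, y: Tag) -> bool {
            x == y || x == Tag::Marker || y == Tag::Marker
        }

        /// Two sequences denote the same data trace iff their projections
        /// onto each dependent pair of tags agree (the standard projection
        /// characterization of trace equivalence).
        fn same_trace(a: &[(Tag, i64)], b: &[(Tag, i64)]) -> bool {
            let tags = [Tag::SensorA, Tag::SensorB, Tag::Marker];
            for &t1 in &tags {
                for &t2 in &tags {
                    if !dependent(t1, t2) {
                        continue;
                    }
                    let proj = |s: &[(Tag, i64)]| -> Vec<(Tag, i64)> {
                        s.iter().copied().filter(|(x, _)| *x == t1 || *x == t2).collect()
                    };
                    if proj(a) != proj(b) {
                        return false;
                    }
                }
            }
            true
        }

        fn main() {
            let s1 = [(Tag::SensorA, 1), (Tag::SensorB, 2), (Tag::Marker, 0)];
            let s2 = [(Tag::SensorB, 2), (Tag::SensorA, 1), (Tag::Marker, 0)];
            // SensorA and SensorB items are independent, so both sequences
            // represent the same data trace.
            println!("{}", same_trace(&s1, &s2)); // true
        }

    A deployment is semantics-preserving exactly when it cannot distinguish two inputs in the same equivalence class, which is what the link types let the compiler check.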
  6. In real-time decision making and runtime monitoring applications, declarative languages are commonly used as they facilitate modular high-level specifications with the compiler guaranteeing evaluation over data streams in an efficient and incremental manner. We introduce the model of Data Transducers to allow modular compilation of queries over streaming data. A data transducer maintains a finite set of data variables and processes a sequence of tagged data values by updating its variables using an allowed set of operations. The model allows unambiguous nondeterminism, exponentially succinct control, and combining values from parallel threads of computation. The semantics of the model immediately suggests an efficient streaming algorithm for evaluation. The expressiveness of data transducers coincides with streamable regular transductions, a robust and streamable class of functions characterized by MSO-definable string-to-DAG transformations with no backward edges. We show that the novel features of data transducers, unlike previously studied transducers, make them as succinct as traditional imperative code for processing data streams, but the structuring of the transition function permits modular compilation. In particular, we show that operations such as parallel composition, union, prefix-sum, and quantitative analogs of combinators for unambiguous parsing, can be implemented by natural and succinct constructions on data transducers. To illustrate the benefits of such modularity in compilation, we define a new language for quantitative monitoring, QRE-Past, that integrates features of past-time temporal logic and quantitative regular expressions. While this combination allows a natural specification of a cardiac arrhythmia detection algorithm in QRE-Past, compilation of QRE-Past specifications into efficient monitors comes for free thanks to succinct constructions on data transducers.
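    A minimal sketch (our own illustration, far simpler than the paper's full model) of the flavor of a data transducer: a fixed, finite set of data variables updated by simple operations as each tagged value arrives, giving a one-pass incremental evaluator.

        enum Event {
            Sample(i64), // a tagged data value
            Query,       // ask for the current aggregate
        }

        /// Data variables of the transducer: a running sum and a count.
        struct AvgTransducer {
            sum: i64,
            count: i64,
        }

        impl AvgTransducer {
            fn new() -> Self {
                AvgTransducer { sum: 0, count: 0 }
            }

            /// One transition per input event, constant work per item, so
            /// the whole stream is evaluated in a single incremental pass.
            fn step(&mut self, e: &Event) -> Option<f64> {
                match e {
                    Event::Sample(v) => {
                        self.sum += v;
                        self.count += 1;
                        None
                    }
                    Event::Query => {
                        (self.count > 0).then(|| self.sum as f64 / self.count as f64)
                    }
                }
            }
        }

        fn main() {
            let stream = [Event::Sample(3), Event::Sample(5), Event::Query];
            let mut t = AvgTransducer::new();
            for e in &stream {
                if let Some(avg) = t.step(e) {
                    println!("average so far: {avg}"); // 4
                }
            }
        }

    Larger queries are built by composing such transducers (parallel composition, union, prefix-sum, and so on), which is the modular compilation the abstract emphasizes.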